ABSTRACT

The existing work used a full month of Twitter data to
evaluate SEISMIC. The original data set contains over
3.2 billion tweets and retweets posted on Twitter from
October 7 to November 7, 2011. Only tweets that have at
least 50 retweets, whose text does not contain a pound
sign # (hashtag), and whose original poster's language
is English were kept. In the end, 166,076 tweets
satisfy these criteria. In this paper we propose
mining tweets that carry a particular #hashtag and
estimating their number of retweets efficiently, so
that tweets can be organized into particular
categories while the popularity of retweets is mined.

Keywords: Information diffusion; cascade prediction;
self-exciting point process; contagion; social media

INTRODUCTION 

Online social networking services, such as Facebook,
YouTube and Twitter, allow their users to post and
share content in the form of posts, images, and videos.
As a user is exposed to the posts of others she follows,
she may in turn reshare a post with her own followers,
who may further reshare it with their respective sets
of followers. In this way large information cascades of
post resharing spread through the network.


A fundamental question in modeling information
cascades is how to predict their future evolution.
Arguably the most direct way to formulate this question
is to consider predicting the final size of an
information cascade, that is, to predict how many
reshares a given post will ultimately receive.

Our model gives only 15% relative error in predicting
the final size of an average information cascade after
observing it for just one hour.
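
The relative error quoted above can be read as the absolute difference between the predicted and the true final cascade size, divided by the true size. The short sketch below shows that arithmetic; the function name and the example numbers are hypothetical and are not taken from the evaluation.

```python
def relative_error(predicted: float, actual: float) -> float:
    """Absolute relative error |predicted - actual| / actual."""
    if actual <= 0:
        raise ValueError("actual final size must be positive")
    return abs(predicted - actual) / actual


# Hypothetical cascade: 230 reshares predicted, 200 reshares received in the end.
print(f"{relative_error(230, 200):.0%}")
```

A prediction of 230 reshares for a cascade that ends at 200 is off by 30 reshares, i.e. a 15% relative error.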

LITERATURE  REVIEW: 

The study of information cascades is a rich and active
field. Recent models for predicting the size of
information cascades generally follow one of two
approaches: feature-based methods and point-process-based
methods. The process of adopting new innovations has
been studied for over 30 years, and one of the most
popular adoption models is described by Diffusion of
Innovations. Much research from a broad variety of
disciplines has used the model as a framework. These
disciplines include political science, public health,
communications, history, economics, technology, and
education, and Rogers' theory has been described as a
widely used theoretical framework in the area of
technology diffusion and adoption. [1]

It is widely accepted that some time after the
occurrence of a major earthquake the aftershock
activity dies off and background activity surpasses the
aftershock activity. Prior to the next major
earthquake, preseismic quiescence and then foreshocks
are expected to appear in the focal region. Seismic
quiescence and the related seismic gap have therefore
been studied by many seismologists for the purpose of
earthquake prediction. The same pattern of activity
that dies off over time is also useful for predicting
information cascades, in particular the final
popularity of retweets. [2]

Social  network  services  have  become  a  viable  source 
of  information  for  users.  In  Twitter,  information 
deemed  important  by  the  community  propagates 
through  retweets.  Studying  the  characteristics  of  such 
popular  messages  is  important  for  a  number  of  tasks, 
such  as  breaking  news  detection,  personalized 
message  recommendation,  viral  marketing  and  others. 
We  cast  the  problem  of  predicting  the  popularity  of 
messages  into  two  classification  problems:  1)  a  binary 
classification  problem  that  predicts  whether  or  not  a 
message  will  be  retweeted,  and,  2)  a  multi-class 
classification  problem  that  predicts  the  volume  of 
retweets  a  particular  message  will  receive  in  the  near 
future.  [3] 

Modeling and predicting retweeting dynamics in social
media has important implications for an array of
applications. Existing models either fail to model the
triggering effect of retweeting dynamics, e.g., the
model based on the reinforced Poisson process, or are
hard to train using only the retweeting dynamics of an
individual tweet, e.g., the model based on the
self-exciting Hawkes process. Our model is motivated by
the observation that the retweeting process of tweets
can generally be characterized by a diffusion tree with
only a handful of key nodes, each triggering a high
number of retweets. [4]

Retweeting is the key mechanism for information
diffusion in Twitter. It emerged as a simple yet
powerful way of disseminating information in the
Twitter social network. One interesting emergent
behavior in Twitter is the practice of retweeting,
which is the relaying of a tweet that has been written
by another Twitter user. This can be done in one of
two ways. First, one can retweet by preceding the tweet
with RT and addressing the original author with @. For
example, "RT @userA: my experience with the new #iPad
is great!" Second, Twitter also enables users to
retweet easily with one click. [5]
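
The manual "RT @user: ..." convention described above can be recognized with a regular expression. The following is a minimal sketch under the assumption that the marker appears at the very start of the tweet; the names `MANUAL_RT` and `parse_manual_retweet` are hypothetical helpers introduced only for illustration.

```python
import re
from typing import Optional, Tuple

# "RT @user: text" at the start of a tweet, as in the example above.
MANUAL_RT = re.compile(r"^RT\s+@(\w+):?\s*(.*)$", re.DOTALL)


def parse_manual_retweet(text: str) -> Optional[Tuple[str, str]]:
    """Return (original_author, relayed_text), or None if not a manual retweet."""
    match = MANUAL_RT.match(text.strip())
    if match is None:
        return None
    return match.group(1), match.group(2)


print(parse_manual_retweet("RT @userA: my experience with the new #iPad is great!"))
```

One-click retweets carry no such textual marker, so they have to be identified from the tweet metadata instead.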

EXISTING  SYSTEM: 

The existing work used a full month of Twitter data to
evaluate SEISMIC. The original data set contains over
3.2 billion tweets and retweets posted on Twitter from
October 7 to November 7, 2011. Only tweets that have at
least 50 retweets, whose text does not contain a pound
sign # (hashtag), and whose original poster's language
is English were kept. In the end, 166,076 tweets
satisfy these criteria.
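
The three selection criteria of the existing system can be written as a single predicate. This is only an illustrative sketch: the `Tweet` record, its field names and the use of the language code "en" are assumptions, not the data format of the original study.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Tweet:
    text: str
    retweet_count: int
    language: str  # assumed language code of the original poster, e.g. "en"


def keep_for_evaluation(tweet: Tweet, min_retweets: int = 50) -> bool:
    """At least 50 retweets, no '#' in the text, English original poster."""
    return (tweet.retweet_count >= min_retweets
            and "#" not in tweet.text
            and tweet.language == "en")


def filter_tweets(tweets: Iterable[Tweet]) -> List[Tweet]:
    """Keep only the tweets that satisfy all three criteria."""
    return [t for t in tweets if keep_for_evaluation(t)]
```

Note that the second condition discards every tweet containing a hashtag, which is exactly the group of tweets the proposed system targets.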

PROPOSED  SYSTEM: 

We propose mining tweets that carry a particular
#hashtag and estimating their number of retweets
efficiently, so that tweets can be organized into
particular categories while the popularity of retweets
is mined.

METHODOLOGY: 

In this project we estimate the total number of
retweets, i.e. the popularity of a tweet, with the
SEISMIC (self-exciting point process) algorithm. First,
we create a Twitter application and authenticate
against it from RStudio. This requires the packages
seismic, twitteR and devtools. After installing the
necessary packages we retrieve the data from Twitter;
tweets can generally be searched with
searchTwitter("example"). Then, by defining the retweet
pattern, we fetch the tweets that have been retweeted
over a period of time, together with attributes such as
the tweet, its time and date, and whether it is a
retweet. The core of the methodology is that we
retrieve the data by hashtag, specifying the number of
tweets to be retrieved. The retrieved tweets are stored
in a data frame and written to a CSV file. Once this is
complete, the content can be analyzed graphically in
Exploratory, along with the estimated parameters. The
sequence of steps is shown in the flow diagram below.
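
The retrieval and storage steps described above can be sketched without any Twitter access, assuming the tweets have already been fetched as a list of records. The column names `isRetweet` and `retweetCount` follow the retrieved data set; `text`, `created`, the helper names and the sample handling are assumptions made for this illustration.

```python
import csv
import io
from typing import Dict, List

FIELDS = ["text", "created", "isRetweet", "retweetCount"]


def has_hashtag(text: str, hashtag: str) -> bool:
    """True if the tweet text contains the hashtag (case-insensitive)."""
    tag = "#" + hashtag.lstrip("#").lower()
    return any(word.lower().rstrip(".,!?:;") == tag for word in text.split())


def tweets_to_csv(rows: List[Dict], hashtag: str, limit: int) -> str:
    """Keep at most `limit` tweets carrying the hashtag and render them as CSV."""
    selected = [r for r in rows if has_hashtag(r["text"], hashtag)][:limit]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(selected)
    return buffer.getvalue()
```

Writing to an in-memory buffer keeps the sketch side-effect free; passing an open file instead of `io.StringIO()` would produce the CSV file used for the later analysis.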


@ IJTSRD | Available Online @ www.ijtsrd.com | Volume - 1 | Issue - 5 | July-Aug 2017




International  Journal  of  Trend  in  Scientific  Research  and  Development  (IJTSRD)  ISSN:  2456-6470 


[Flow diagram: Creation of Twitter application → Authentication → Fetching data → Defining retweet pattern → Retrieval of data → Analyzing retweets → Final number of retweets]

FIG 1: Methodology


ALGORITHM  USED: 

SEISMIC: 

Purpose: For a given post at time $t$, predict its final reshare count.
Input: Post resharing information: $t_i$ and $n_i$ for $i = 0, \dots, R_t$ (the reshare times and the follower counts of the users who posted them).

SEISMIC  models  the  information  cascade  as  a  self-exciting  point  process.  In  a  self-exciting  point  process,  each 
reshare  not  only  increases  the  cumulative  count  by  one,  it  also  exposes  new  followers  who  may  further  reshare 
the  post.  This  property  is  ideal  to  model  the  "rich  get  richer"  phenomenon  in  information  spreading. 

SEISMIC implements a fast kernel-weighted method to estimate the temporally evolving infectiousness, which
fully characterizes an information cascade. Roughly speaking, the infectiousness measures how likely the post is
to be reshared at a given time. Then, if the infectiousness is smaller than a threshold, SEISMIC can accurately
predict the final popularity of the post.
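
The role of the infectiousness and of the threshold can be illustrated with a simplified sketch. It is not the SEISMIC implementation: it uses a plain (unweighted) estimate of the infectiousness instead of the kernel-weighted one and leaves out SEISMIC's correction factors. The memory kernel shape (flat up to `s0`, then power-law decay) and the values of `s0`, `theta` and `mean_followers` are illustrative assumptions.

```python
import math
from typing import Sequence


def kernel_cdf(s: float, s0: float = 300.0, theta: float = 0.242) -> float:
    """Fraction of a user's followers who have reacted within s seconds.

    Integral of a kernel that is constant up to s0 and then decays as
    (s / s0) ** -(1 + theta), normalized so that it tends to 1.
    """
    if s <= 0:
        return 0.0
    c = 1.0 / (s0 * (1.0 + 1.0 / theta))  # normalizing constant
    if s <= s0:
        return c * s
    return c * s0 + (c * s0 / theta) * (1.0 - (s / s0) ** (-theta))


def predict_final_reshares(times: Sequence[float], followers: Sequence[int],
                           t: float, mean_followers: float) -> float:
    """times[0] = 0 is the original post; followers[i] belongs to poster i."""
    observed = [(ti, ni) for ti, ni in zip(times, followers) if ti <= t]
    reshares = len(observed) - 1                       # R_t
    exposed = sum(ni for _, ni in observed)            # followers exposed so far
    effective = sum(ni * kernel_cdf(t - ti) for ti, ni in observed)
    if effective <= 0:
        return float(reshares)
    infectiousness = reshares / effective              # unweighted estimate
    if infectiousness * mean_followers >= 1.0:
        return math.inf                                # above the threshold
    remaining = infectiousness * (exposed - effective)
    return reshares + remaining / (1.0 - infectiousness * mean_followers)
```

When the estimated infectiousness times the average follower count reaches 1, each reshare is expected to trigger at least one further reshare, so the sketch returns infinity instead of a finite prediction.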

NETWORK  DIAGRAM: 




Fig  2:  Network  Diagram 


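BETWEENNESS CENTRALITY:

In graph theory, betweenness centrality is a measure of centrality in a graph based on shortest paths. For a vertex v it
sums, over all pairs of other vertices, the fraction of shortest paths between the pair that pass through v.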
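
As an illustration of the definition, the sketch below computes plain (unweighted) betweenness centrality with Brandes' algorithm on a small undirected graph. It is not the distance-weighted variant reported in the following table, and the adjacency-dictionary input format and the toy graphs are assumptions made for the example.

```python
from collections import deque
from typing import Dict, Hashable, List


def betweenness(adj: Dict[Hashable, List[Hashable]]) -> Dict[Hashable, float]:
    """Brandes' algorithm for an unweighted, undirected graph."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        order = []                                   # vertices by distance from s
        preds = {v: [] for v in adj}                 # predecessors on shortest paths
        sigma = {v: 0 for v in adj}                  # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                                 # breadth-first search
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):                    # accumulate dependencies
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    # every unordered pair was counted from both endpoints
    return {v: b / 2.0 for v, b in score.items()}
```

In a star with three leaves, for example, the hub lies on the shortest path of all three leaf pairs and receives a betweenness of 3, while every leaf receives 0.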

CODE: 


DISTANCE-WEIGHTED BETWEENNESS:

No.   User               DW Betweenness
1     rashishetty764     5.379
2     Aditya74115170     12.303
3     Aditya74115170     41.807
4     Aditya74115170     5.695
5     sujithindia        3.020
6     mateguy26          12.400
7     NarendraModi1FC    12.512
8     SupportModi        0
9     SupportModi        0
10    universalthamil    0
11    Godse7Shankar      0
12    SupportModi_       0

EXPLANATION: 

sample_pa is a simple stochastic algorithm to generate a graph. It is a discrete time step model and in each time
step a single vertex is added. We start with a single vertex and no edges in the first time step. Then we add one
vertex in each time step and the new vertex initiates some edges to old vertices. The probability that an old
vertex i is chosen is given by

$P[i] \sim k_i^{\alpha} + a$

where $k_i$ is the in-degree of vertex i in the current time step, and $\alpha$ and $a$ are parameters of the model.

sample_pa generates a directed graph by default; set directed to FALSE to generate an undirected graph. Note
that even if an undirected graph is generated, $k_i$ denotes the number of adjacent edges not initiated by the
vertex itself and not the total (in- + out-) degree of the vertex, unless the out.pref argument is set to TRUE.
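
The growth rule can be sketched in a few lines. This is an illustrative re-implementation of the idea, not the igraph code: it assumes that each new vertex initiates exactly one edge, and the function name and the parameter names `power` and `zero_appeal` are chosen here for the exponent and the additive constant in the formula above.

```python
import random
from typing import List, Optional, Tuple


def sample_pa_sketch(n: int, power: float = 1.0, zero_appeal: float = 1.0,
                     seed: Optional[int] = None) -> List[Tuple[int, int]]:
    """Grow a directed graph on vertices 0..n-1, one vertex per time step.

    Each new vertex sends one edge to an old vertex i chosen with
    probability proportional to in_degree[i] ** power + zero_appeal.
    """
    rng = random.Random(seed)
    in_degree = [0]                      # time step 1: one vertex, no edges
    edges = []
    for new in range(1, n):
        weights = [k ** power + zero_appeal for k in in_degree]
        old = rng.choices(range(new), weights=weights, k=1)[0]
        edges.append((new, old))         # edge initiated by the new vertex
        in_degree[old] += 1
        in_degree.append(0)
    return edges


edges = sample_pa_sketch(10, seed=1)
print(len(edges))  # one edge per added vertex
```

Because vertices that already have many incoming edges are more likely to be chosen again, a few vertices end up collecting most of the edges, which mirrors the "rich get richer" behavior of retweet cascades.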

OUTPUT: 


[Screenshot of RStudio: data viewer showing the narendramodi data frame (100 obs. of 16 variables; visible columns include id, replyToUID, statusSource, screenName, retweetCount, isRetweet and retweeted), together with the following console session:]

> write.csv(tweetsdf, file='F:/sem 2/SNA/datasets/narendramodi.csv', row.names=F)
> rt_patterns = grep("(RT|via)((?:\\b\\w*@\\w+)+)", tweet_txt, ignore.case=TRUE)
> tweet_txt[rt_patterns]
character(0)
> rt_patterns
integer(0)
> narendramodi <- read.csv("F:/sem 2/SNA/datasets/narendramodi.csv", comment.char="#")
> View(narendramodi)

FIG 3: RSTUDIO

The above screenshot shows the retrieval of retweets with the hashtag narendramodi.

[Screenshot of Exploratory: summary view of the narendramodi data set (local CSV, 16 columns, 100 rows) with per-column statistics for statusSource, screenName, retweetCount, longitude and latitude. retweetCount ranges from 0 to 388 with a median of 0; screenName has 63 unique values, the most frequent being SupportModi_ (13) and VisvyaMedia (7).]

FIG 4: EXPLORATORY ANALYSIS

The above screenshot shows a summary of the retweets.


[Screenshot of Exploratory: chart view of the narendramodi data set (local CSV, 16 columns, 100 rows)]

FIG 5: EXPLORATORY ANALYSIS

From this view we obtain the graphical representation of the retweet data.

CONCLUSION 

Our approach provides a theoretical framework for
explaining temporal patterns of information cascades.
SEISMIC is both scalable and accurate. The model
requires no feature engineering and scales linearly
with the number of observed reshares of a given post.
This provides a way to predict information spread for
millions of posts in an online real-time setting.
SEISMIC brings extra flexibility to estimation and
prediction tasks as it requires minimal knowledge about
the information cascade as well as the underlying
network structure. A future enhancement would be to
reduce the relative error even further.

ACKNOWLEDGEMENTS: 

This research has been supported by the tools UCINET,
Exploratory and RStudio.

RESULT: 

The output is the analysis of the total number of
retweets, from which the information cascade is
predicted. Below are some screenshots taken after
implementing the cascade prediction.


FIG  6:  CASCADE  CURVE 

FUTURE ENHANCEMENT:

> Predicting the retweet count with even lower
relative error.

> Improving the efficiency of mining.